Overview

Dataset statistics

Number of variables: 10
Number of observations: 462
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 36.2 KiB
Average record size in memory: 80.3 B

Variable types

NUM: 8
BOOL: 2

Reproduction

Analysis started: 2020-08-25 01:48:27.925074
Analysis finished: 2020-08-25 01:48:38.117122
Duration: 10.19 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Tobacco has 107 (23.2%) zeros
Alcohol has 110 (23.8%) zeros
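
The percentages in these warnings are the zero counts divided by the 462 observations listed under Dataset statistics. A minimal sketch of that arithmetic, using only figures from this report (the variable names are illustrative and not part of pandas-profiling):

```python
# Figures copied from this report; the names are illustrative only.
n_observations = 462
zero_counts = {"Tobacco": 107, "Alcohol": 110}

# Share of zeros per variable, as a percentage rounded to one decimal place.
zero_share = {
    name: round(100 * count / n_observations, 1)
    for name, count in zero_counts.items()
}

for name, share in zero_share.items():
    print(f"{name}: {share}% zeros")
```

Roughly a quarter of the observations are zero for both variables, which is worth keeping in mind when reading their skewed histograms and high coefficients of variation.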

Variables

Sbp
Real number (ℝ≥0)

Distinct count: 62
Unique (%): 13.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 138.32683982683983
Minimum: 101
Maximum: 218
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 101
5-th percentile: 112
Q1: 124
median: 134
Q3: 148
95-th percentile: 176
Maximum: 218
Range: 117
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 20.49631718
Coefficient of variation (CV): 0.1481731037
Kurtosis: 1.781646545
Mean: 138.3268398
Median Absolute Deviation (MAD): 12
Skewness: 1.180590625
Sum: 63907
Variance: 420.0990178
Histogram with fixed size bins (bins=10)
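
Several of the Sbp figures above are tied together by simple identities. The sketch below uses only values copied from this section, with illustrative variable names, to check the range, the IQR, the coefficient of variation, the variance and the mean against one another:

```python
# Values copied from the Sbp section of this report; names are illustrative.
minimum, maximum = 101, 218
q1, q3 = 124, 148
mean = 138.3268398
std = 20.49631718
total, n_observations = 63907, 462

value_range = maximum - minimum        # Range
iqr = q3 - q1                          # Interquartile range
cv = std / mean                        # Coefficient of variation
variance = std ** 2                    # Variance is the squared standard deviation
mean_from_sum = total / n_observations # Mean recomputed from Sum and the row count

print(value_range, iqr, round(cv, 4), round(variance, 2), round(mean_from_sum, 4))
```

The same checks apply to the other numeric variables; only the copied constants change.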
Common values

Value               Count  Frequency (%)
134                 29     6.3%
136                 29     6.3%
128                 25     5.4%
132                 24     5.2%
124                 21     4.5%
118                 21     4.5%
126                 20     4.3%
130                 20     4.3%
138                 18     3.9%
122                 17     3.7%
120                 14     3.0%
142                 14     3.0%
148                 12     2.6%
140                 12     2.6%
114                 12     2.6%
146                 11     2.4%
144                 10     2.2%
154                 10     2.2%
162                 9      1.9%
160                 9      1.9%
152                 8      1.7%
116                 8      1.7%
166                 8      1.7%
108                 7      1.5%
112                 7      1.5%
Other values (37)   87     18.8%

Minimum 10 values

Value               Count  Frequency (%)
101                 1      0.2%
102                 1      0.2%
103                 1      0.2%
106                 3      0.6%
108                 7      1.5%
109                 1      0.2%
110                 4      0.9%
112                 7      1.5%
114                 12     2.6%
116                 8      1.7%

Maximum 10 values

Value               Count  Frequency (%)
218                 1      0.2%
216                 1      0.2%
214                 1      0.2%
208                 3      0.6%
206                 2      0.4%
200                 1      0.2%
198                 1      0.2%
194                 2      0.4%
190                 2      0.4%
188                 1      0.2%

Tobacco
Real number (ℝ≥0)

ZEROS

Distinct count: 214
Unique (%): 46.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.635649350649351
Minimum: 0.0
Maximum: 31.2
Zeros: 107
Zeros (%): 23.2%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.0525
median: 2
Q3: 5.5
95-th percentile: 12.49
Maximum: 31.2
Range: 31.2
Interquartile range (IQR): 5.4475

Descriptive statistics

Standard deviation: 4.593024078
Coefficient of variation (CV): 1.263329776
Kurtosis: 5.968107866
Mean: 3.635649351
Median Absolute Deviation (MAD): 2
Skewness: 2.079209667
Sum: 1679.67
Variance: 21.09587018
Histogram with fixed size bins (bins=10)
Common values

Value               Count  Frequency (%)
0                   107    23.2%
6                   11     2.4%
3                   10     2.2%
0.4                 8      1.7%
4                   8      1.7%
4.2                 7      1.5%
4.5                 7      1.5%
1.2                 5      1.1%
0.6                 5      1.1%
2                   5      1.1%
1.5                 5      1.1%
12                  5      1.1%
8.8                 4      0.9%
0.12                4      0.9%
5.6                 4      0.9%
7.5                 4      0.9%
5.5                 4      0.9%
0.05                4      0.9%
2.6                 3      0.6%
1.8                 3      0.6%
0.8                 3      0.6%
2.8                 3      0.6%
0.5                 3      0.6%
0.28                3      0.6%
10.5                3      0.6%
Other values (189)  234    50.6%

Minimum 10 values

Value               Count  Frequency (%)
0                   107    23.2%
0.01                1      0.2%
0.02                1      0.2%
0.03                1      0.2%
0.04                2      0.4%
0.05                4      0.9%
0.06                1      0.2%
0.07                1      0.2%
0.08                2      0.4%
0.09                1      0.2%

Maximum 10 values

Value               Count  Frequency (%)
31.2                1      0.2%
27.4                1      0.2%
25.01               1      0.2%
20                  2      0.4%
19.6                1      0.2%
19.45               1      0.2%
19.2                1      0.2%
18.2                1      0.2%
18                  1      0.2%
16                  1      0.2%

Ldl
Real number (ℝ≥0)

Distinct count: 329
Unique (%): 71.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.740324675324675
Minimum: 0.98
Maximum: 15.33
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 0.98
5-th percentile: 2.1945
Q1: 3.2825
median: 4.34
Q3: 5.79
95-th percentile: 8.404
Maximum: 15.33
Range: 14.35
Interquartile range (IQR): 2.5075

Descriptive statistics

Standard deviation: 2.070909161
Coefficient of variation (CV): 0.4368707426
Kurtosis: 2.876552943
Mean: 4.740324675
Median Absolute Deviation (MAD): 1.195
Skewness: 1.31310398
Sum: 2190.03
Variance: 4.288664753
Histogram with fixed size bins (bins=10)
Common values

Value               Count  Frequency (%)
3.95                5      1.1%
4.37                5      1.1%
3.57                5      1.1%
2.4                 4      0.9%
3.58                4      0.9%
3.3                 4      0.9%
4.16                4      0.9%
3.14                3      0.6%
1.88                3      0.6%
4.9                 3      0.6%
3.17                3      0.6%
5.9                 3      0.6%
4.19                3      0.6%
3.79                3      0.6%
6.06                3      0.6%
2.42                3      0.6%
3.69                3      0.6%
5.63                3      0.6%
2.28                3      0.6%
4.89                3      0.6%
4.55                3      0.6%
3.12                3      0.6%
3.98                3      0.6%
2.44                3      0.6%
4.75                3      0.6%
Other values (304)  377    81.6%

Minimum 10 values

Value               Count  Frequency (%)
0.98                1      0.2%
1.07                1      0.2%
1.43                1      0.2%
1.55                1      0.2%
1.59                1      0.2%
1.71                1      0.2%
1.72                1      0.2%
1.74                1      0.2%
1.77                1      0.2%
1.8                 1      0.2%

Maximum 10 values

Value               Count  Frequency (%)
15.33               1      0.2%
14.16               1      0.2%
12.42               1      0.2%
11.89               1      0.2%
11.61               1      0.2%
11.41               1      0.2%
11.32               1      0.2%
11.17               1      0.2%
10.58               1      0.2%
10.53               1      0.2%

Adiposity
Real number (ℝ≥0)

Distinct count: 408
Unique (%): 88.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.4067316017316
Minimum: 6.74
Maximum: 42.49
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 6.74
5-th percentile: 12.0065
Q1: 19.775
median: 26.115
Q3: 31.2275
95-th percentile: 37.1165
Maximum: 42.49
Range: 35.75
Interquartile range (IQR): 11.4525

Descriptive statistics

Standard deviation: 7.780698596
Coefficient of variation (CV): 0.306245554
Kurtosis: -0.6984386244
Mean: 25.4067316
Median Absolute Deviation (MAD): 5.7
Skewness: -0.2146459286
Sum: 11737.91
Variance: 60.53927064
Histogram with fixed size bins (bins=10)
Common values

Value               Count  Frequency (%)
30.79               3      0.6%
29.3                3      0.6%
21.1                3      0.6%
27.55               3      0.6%
30.84               2      0.4%
9.69                2      0.4%
27.68               2      0.4%
23.52               2      0.4%
28.11               2      0.4%
20.47               2      0.4%
37.83               2      0.4%
31.29               2      0.4%
34.46               2      0.4%
15.89               2      0.4%
18.96               2      0.4%
23.07               2      0.4%
25.73               2      0.4%
17.33               2      0.4%
24.65               2      0.4%
32.03               2      0.4%
24.83               2      0.4%
29.18               2      0.4%
16.38               2      0.4%
12.13               2      0.4%
23.88               2      0.4%
Other values (383)  408    88.3%

Minimum 10 values

Value               Count  Frequency (%)
6.74                1      0.2%
7.12                1      0.2%
8.66                1      0.2%
9.28                1      0.2%
9.37                1      0.2%
9.39                1      0.2%
9.64                1      0.2%
9.69                2      0.4%
9.74                1      0.2%
10.05               1      0.2%

Maximum 10 values

Value               Count  Frequency (%)
42.49               1      0.2%
42.17               1      0.2%
42.06               1      0.2%
41.05               1      0.2%
40.6                1      0.2%
39.97               1      0.2%
39.71               1      0.2%
39.68               1      0.2%
39.66               1      0.2%
39.64               1      0.2%

Famhist
Boolean

Distinct count: 2
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB

Value               Count  Frequency (%)
0                   270    58.4%
1                   192    41.6%

Typea
Real number (ℝ≥0)

Distinct count: 54
Unique (%): 11.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53.103896103896105
Minimum: 13
Maximum: 78
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 13
5-th percentile: 36
Q1: 47
median: 53
Q3: 60
95-th percentile: 69
Maximum: 78
Range: 65
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.817534116
Coefficient of variation (CV): 0.1848740834
Kurtosis: 0.4704023399
Mean: 53.1038961
Median Absolute Deviation (MAD): 6
Skewness: -0.3464377547
Sum: 24534
Variance: 96.38397611
Histogram with fixed size bins (bins=10)
Common values

Value               Count  Frequency (%)
52                  25     5.4%
57                  23     5.0%
50                  21     4.5%
54                  21     4.5%
49                  20     4.3%
56                  18     3.9%
60                  18     3.9%
61                  17     3.7%
47                  17     3.7%
55                  17     3.7%
45                  16     3.5%
46                  16     3.5%
51                  15     3.2%
48                  14     3.0%
58                  14     3.0%
53                  14     3.0%
42                  13     2.8%
59                  13     2.8%
63                  11     2.4%
64                  11     2.4%
65                  11     2.4%
62                  9      1.9%
66                  9      1.9%
41                  9      1.9%
69                  7      1.5%
Other values (29)   83     18.0%

Minimum 10 values

Value               Count  Frequency (%)
13                  1      0.2%
20                  1      0.2%
25                  1      0.2%
26                  1      0.2%
28                  1      0.2%
29                  1      0.2%
30                  2      0.4%
31                  2      0.4%
32                  1      0.2%
33                  4      0.9%

Maximum 10 values

Value               Count  Frequency (%)
78                  1      0.2%
77                  1      0.2%
75                  1      0.2%
74                  2      0.4%
73                  2      0.4%
72                  4      0.9%
71                  2      0.4%
70                  5      1.1%
69                  7      1.5%
68                  6      1.3%

Obesity
Real number (ℝ≥0)

Distinct count: 400
Unique (%): 86.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.04411255411255
Minimum: 14.7
Maximum: 46.58
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 14.7
5-th percentile: 20.17
Q1: 22.985
median: 25.805
Q3: 28.4975
95-th percentile: 33.138
Maximum: 46.58
Range: 31.88
Interquartile range (IQR): 5.5125

Descriptive statistics

Standard deviation: 4.213680227
Coefficient of variation (CV): 0.161790125
Kurtosis: 2.255971618
Mean: 26.04411255
Median Absolute Deviation (MAD): 2.71
Skewness: 0.9052194041
Sum: 12032.38
Variance: 17.75510105
Histogram with fixed size bins (bins=10)
Common values

Value               Count  Frequency (%)
24.86               4      0.9%
26.09               4      0.9%
22.01               3      0.6%
27.29               3      0.6%
21.94               3      0.6%
28.4                3      0.6%
24.7                3      0.6%
22.59               3      0.6%
24.98               3      0.6%
25.99               3      0.6%
22.51               3      0.6%
31.44               2      0.4%
30.01               2      0.4%
23.63               2      0.4%
28.63               2      0.4%
25.63               2      0.4%
30.31               2      0.4%
20.28               2      0.4%
27.36               2      0.4%
22.65               2      0.4%
24.49               2      0.4%
29.38               2      0.4%
28.07               2      0.4%
23.23               2      0.4%
23.37               2      0.4%
Other values (375)  399    86.4%

Minimum 10 values

Value               Count  Frequency (%)
14.7                1      0.2%
17.75               1      0.2%
17.81               1      0.2%
17.89               1      0.2%
18.36               1      0.2%
18.46               1      0.2%
18.5                1      0.2%
18.75               1      0.2%
19.15               1      0.2%
19.3                1      0.2%

Maximum 10 values

Value               Count  Frequency (%)
46.58               1      0.2%
45.72               1      0.2%
41.76               1      0.2%
40.34               1      0.2%
38.8                1      0.2%
37.71               1      0.2%
37.41               1      0.2%
37.24               1      0.2%
36.46               1      0.2%
36.06               1      0.2%

Alcohol
Real number (ℝ≥0)

ZEROS

Distinct count: 249
Unique (%): 53.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.044393939393935
Minimum: 0.0
Maximum: 147.19
Zeros: 110
Zeros (%): 23.8%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.51
median: 7.51
Q3: 23.8925
95-th percentile: 66.8495
Maximum: 147.19
Range: 147.19
Interquartile range (IQR): 23.3825

Descriptive statistics

Standard deviation: 24.48105869
Coefficient of variation (CV): 1.43631148
Kurtosis: 6.421109969
Mean: 17.04439394
Median Absolute Deviation (MAD): 7.51
Skewness: 2.312698937
Sum: 7874.51
Variance: 599.3222347
Histogram with fixed size bins (bins=10)
Common values

Value               Count  Frequency (%)
0                   110    23.8%
2.06                16     3.5%
0.51                8      1.7%
11.11               5      1.1%
43.2                5      1.1%
8.23                5      1.1%
14.4                5      1.1%
8.33                5      1.1%
4.11                4      0.9%
3.81                4      0.9%
1.03                4      0.9%
12.86               4      0.9%
6.17                3      0.6%
23.66               3      0.6%
2.78                3      0.6%
2.49                3      0.6%
11.83               3      0.6%
10.49               3      0.6%
13.37               3      0.6%
4.63                3      0.6%
21.6                3      0.6%
18.51               3      0.6%
18.72               2      0.4%
3.6                 2      0.4%
28.8                2      0.4%
Other values (224)  251    54.3%

Minimum 10 values

Value               Count  Frequency (%)
0                   110    23.8%
0.19                1      0.2%
0.26                1      0.2%
0.37                2      0.4%
0.51                8      1.7%
0.6                 1      0.2%
0.68                2      0.4%
0.69                1      0.2%
0.74                2      0.4%
0.86                1      0.2%

Maximum 10 values

Value               Count  Frequency (%)
147.19              1      0.2%
145.29              1      0.2%
144                 1      0.2%
120.03              1      0.2%
109.8               1      0.2%
108                 1      0.2%
100.32              1      0.2%
97.2                1      0.2%
92.62               1      0.2%
90.93               1      0.2%

Age
Real number (ℝ≥0)

Distinct count: 49
Unique (%): 10.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.816017316017316
Minimum: 15
Maximum: 64
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 15
5-th percentile: 17
Q1: 31
median: 45
Q3: 55
95-th percentile: 62
Maximum: 64
Range: 49
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 14.60895644
Coefficient of variation (CV): 0.3412030675
Kurtosis: -1.01622901
Mean: 42.81601732
Median Absolute Deviation (MAD): 12
Skewness: -0.3817342585
Sum: 19781
Variance: 213.4216084
Histogram with fixed size bins (bins=10)
Common values

Value               Count  Frequency (%)
16                  20     4.3%
58                  17     3.7%
17                  17     3.7%
61                  16     3.5%
59                  16     3.5%
55                  16     3.5%
60                  15     3.2%
49                  14     3.0%
53                  14     3.0%
45                  14     3.0%
64                  13     2.8%
38                  13     2.8%
42                  13     2.8%
48                  13     2.8%
40                  12     2.6%
62                  12     2.6%
32                  11     2.4%
46                  11     2.4%
27                  11     2.4%
52                  10     2.2%
54                  10     2.2%
41                  10     2.2%
39                  10     2.2%
33                  9      1.9%
56                  9      1.9%
Other values (24)   136    29.4%

Minimum 10 values

Value               Count  Frequency (%)
15                  3      0.6%
16                  20     4.3%
17                  17     3.7%
18                  8      1.7%
19                  2      0.4%
20                  6      1.3%
21                  3      0.6%
23                  2      0.4%
24                  6      1.3%
25                  4      0.9%

Maximum 10 values

Value               Count  Frequency (%)
64                  13     2.8%
63                  8      1.7%
62                  12     2.6%
61                  16     3.5%
60                  15     3.2%
59                  16     3.5%
58                  17     3.7%
57                  8      1.7%
56                  9      1.9%
55                  16     3.5%

target
Boolean

Distinct count: 2
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB

Value               Count  Frequency (%)
0                   302    65.4%
1                   160    34.6%

Interactions

[Figures: 64 pairwise interaction plots, one for each pair of the 8 numeric variables]

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that, for a linear relationship, the steepness of the line (its angle to the x-axis) does not affect r as long as the direction of the slope is unchanged.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
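
As a rough illustration of the calculation just described, the following pure-Python sketch computes r for the Sbp and Age values of the ten first rows shown in the Sample section. The function name `pearson_r` is hypothetical and is not part of pandas-profiling; the sketch also checks the location and scale invariance mentioned above.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population (divide-by-n) forms are used throughout; using n - 1 everywhere
    # would give the same ratio because the factor cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (std_x * std_y)

# Sbp and Age from the first ten sample rows of this report.
sbp = [160, 144, 118, 170, 134, 132, 142, 114, 114, 132]
age = [52, 63, 46, 58, 49, 45, 38, 58, 29, 53]

r = pearson_r(sbp, age)
# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([2.5 * x + 10 for x in sbp], age)
print(round(r, 4), round(r_rescaled, 4))
```

Ten rows are far too few to say anything about the full dataset; the point is only to make the formula concrete.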

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
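
A minimal sketch of this rank-based calculation, assuming that tied values receive the average of their rank positions (a common convention, not confirmed by this report). The helper names `average_ranks` and `spearman_rho` are hypothetical; the data are the Tobacco and Alcohol values of the ten first sample rows.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their std devs."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    # The common 1/n factors cancel, so plain sums are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (std_x * std_y)  # undefined (division by zero) for constant input

# Tobacco and Alcohol from the first ten sample rows of this report.
tobacco = [12.00, 0.01, 0.08, 7.50, 13.60, 6.20, 4.05, 4.08, 0.00, 0.00]
alcohol = [97.20, 2.06, 3.81, 24.26, 57.34, 14.14, 2.62, 6.72, 2.49, 0.00]
rho = spearman_rho(tobacco, alcohol)
print(round(rho, 4))
```

Because only the ordering matters, any strictly increasing transformation of either variable (for example a logarithm of positive values) leaves ρ unchanged.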

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
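
The pair-counting definition above can be sketched directly. This is the plain form of the formula as stated (often called tau-a); library implementations commonly apply a tie correction (tau-b), so values may differ when the data contain ties. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction (concordant).
        # Negative product: they move in opposite directions (discordant).
        # Zero product: a tie in X or Y, counted as neither.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Sbp and Age from the first ten sample rows of this report.
sbp = [160, 144, 118, 170, 134, 132, 142, 114, 114, 132]
age = [52, 63, 46, 58, 49, 45, 38, 58, 29, 53]
tau = kendall_tau_a(sbp, age)
print(round(tau, 4))
```

The loop visits every pair, so the cost grows quadratically with the number of observations; that is acceptable for 462 rows but not for very large tables.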

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available.

Missing values

[Figures: two missing-value plots; the dataset has 0 missing cells]

Sample

First rows

Row  Sbp  Tobacco  Ldl    Adiposity  Famhist  Typea  Obesity  Alcohol  Age  target
0    160  12.00    5.73   23.11      1        49     25.30    97.20    52   1
1    144  0.01     4.41   28.61      0        55     28.87    2.06     63   1
2    118  0.08     3.48   32.28      1        52     29.14    3.81     46   0
3    170  7.50     6.41   38.03      1        51     31.99    24.26    58   1
4    134  13.60    3.50   27.78      1        60     25.99    57.34    49   1
5    132  6.20     6.47   36.21      1        62     30.77    14.14    45   0
6    142  4.05     3.38   16.20      0        59     20.81    2.62     38   0
7    114  4.08     4.59   14.60      1        62     23.11    6.72     58   1
8    114  0.00     3.83   19.40      1        49     24.86    2.49     29   0
9    132  0.00     5.80   30.96      1        69     30.11    0.00     53   1

Last rows

Row  Sbp  Tobacco  Ldl    Adiposity  Famhist  Typea  Obesity  Alcohol  Age  target
452  154  5.53     3.20   28.81      1        61     26.15    42.79    42   0
453  124  1.60     7.22   39.68      1        36     31.50    0.00     51   1
454  146  0.64     4.82   28.02      0        60     28.11    8.23     39   1
455  128  2.24     2.83   26.48      0        48     23.96    47.42    27   1
456  170  0.40     4.11   42.06      1        56     33.10    2.06     57   0
457  214  0.40     5.98   31.72      0        64     28.45    0.00     58   0
458  182  4.20     4.41   32.10      0        52     28.61    18.72    52   1
459  108  3.00     1.59   15.23      0        40     20.09    26.64    55   0
460  118  5.40     11.61  30.79      0        64     27.35    23.97    40   0
461  132  0.00     4.82   33.41      1        62     14.70    0.00     46   1